Prototyping scalable digital signal processing systems 
for radio astronomy using dataflow models 

N. Sane, J. Ford, A. I. Harris, and S. S. Bhattacharyya 

There is a growing trend toward using high-level tools for design and implementation of 
radio astronomy digital signal processing (DSP) systems. Such tools, for example, those 
from the Collaboration for Astronomy Signal Processing and Electronics Research 
(CASPER), are usually platform-specific, and lack high-level, platform-independent, 
portable, scalable application specifications. This limits the designer's ability to 
experiment with designs at a high level of abstraction and early in the development cycle. 

We address some of these issues using a model-based design approach employing 
dataflow models. We demonstrate this approach by applying it to the design of a tunable 
digital downconverter (TDD) used for narrow-bandwidth spectroscopy. Our design is 
targeted toward an FPGA platform, called the Interconnect Break-out Board (IBOB), that 
is available from CASPER. We use the term TDD to refer to a digital downconverter 
for which the decimation factor and center frequency can be reconfigured without the need 
for regenerating the hardware code. Such a design is currently not available in the 
CASPER DSP library. 

The work presented in this paper focuses on two aspects. Firstly, we introduce and 
demonstrate a dataflow-based design approach using the dataflow interchange format 
(DIF) tool for high-level application specification, and we integrate this approach with the 
CASPER tool flow. Secondly, we explore the trade-off between the flexibility of TDD 
designs and the low hardware cost of fixed-configuration digital downconverter (FDD) 
designs that use the available CASPER DSP library. We further explore this trade-off in 
the context of a two-stage downconversion scheme employing a combination of TDD or 
FDD designs. 



1. Introduction 

Key challenges in designing digital signal process- 
ing (DSP) systems employed in the field of radio as- 
tronomy arise from the need to process very large 
amounts of data at very high rates arriving from one 
or more telescopes. It is also desirable to have scal- 
able and reconfigurable designs for shorter develop- 
ment cycles and faster deployment. Moreover, these 
designs should be portable to different platforms to 
keep up with advances in new hardware technologies. 
However, conventional design methodologies for sig- 
nal processing systems in the field of radio astronomy 
focus on custom designs that are platform-specific. 
Such designs, by virtue of being platform-specific, 
are highly specialized, and thus difficult to retar- 
get. Traditional design approaches also lack high- 
level platform-independent application specifications 
that can be experimented with, and later ported to 
and optimized for various target platforms. This lim- 
its the scalability, reconfigurability, portability, and 
evolvability across varying requirements and plat- 
forms of such DSP systems. 

A model-based approach for design and implementation 
of a DSP system can effectively exploit 
the semantics of the underlying models of computation. 
This facilitates precise estimation and optimization 
of system performance and resource requirements 
(e.g., see [Bhattacharyya et al., 2010]). 
Though approaches for scalable and reconfigurable 
design based on modular field programmable gate 
array (FPGA) hardware and software libraries have 
been developed (e.g., see [Parsons et al., 2005, 2006; 
Szomoru, 2011; Nallatech website; Lyrtech website]), 
they do not provide forms of high-level abstraction 
that are linked to formal models of computation. 

We propose an approach using DSP-oriented 
dataflow models of computation to address some of 
these issues [Lee and Messerschmitt, 1987]. Dataflow 
modeling is extensively used in developing embedded 
systems for signal processing and communication ap- 
plications, and electronic design automation [Bhat- 
tacharyya et al., 2010]. Our design methodology in- 
volves specifying the application in the dataflow in- 
terchange format (DIF) [Hsu et al., 2005] using an 
appropriate dataflow model. This application speci- 
fication is transformed into an intermediate, graph- 
ical representation, which can be further processed 
using graph transformations. 

The DIF tool allows designers to verify the func- 
tional correctness of the application, estimate re- 
source requirements, and experiment with various 
dataflow graph transformations, which help to an- 
alyze or optimize the design in terms of specific ob- 
jectives. The DIF-based dataflow specification is 
then used as a reference while developing a platform- 
specific implementation. We show how formal under- 
standing of the dataflow behavior from the software 
prototype allows more efficient prototyping and ex- 
perimentation at a much earlier stage in the design 
cycle compared to conventional design approaches. 

We demonstrate our approach using the design 
of a tunable digital downconverter (TDD) that al- 
lows fine-grain spectroscopy on narrow-band signals. 
A primary motivation behind a TDD design is to 
support changes to the targeted downsampling ratio 
without requiring regeneration of the corresponding 
hardware code. Development of such a TDD is a 
significant contribution of this work. We compare 
our TDD with the fixed-configuration digital down- 
converter (FDD) designs that use the current DSP 
library from the Collaboration for Astronomy Sig- 
nal Processing and Electronics Research (CASPER) 
(see [CASPER Website]). We explore trade-offs be- 
tween the flexibility offered by TDD designs and their 
hardware cost. A TDD is particularly useful since 



our target FPGA hardware platform — interconnect 
break-out board (IBOB) [Parsons et al., 2006] — does 
not have the feature of storing more than one configuration 
(each also referred to as a "personality") and 
dynamically loading one of them, unlike some of the 
CASPER hardware platforms of a later generation. 
A single reconfigurable TDD design also simplifies 
code management when compared to multiple static 
designs. 

We must emphasize that this paper describes a 
dataflow-based design flow for prototyping radio as- 
tronomy DSP systems. This approach is not re- 
stricted to any particular tool or hardware plat- 
form. We intend to demonstrate it by developing 
a high-level DIF prototype that uses dataflow for- 
malisms and generating a hardware implementation 
using CASPER tools from this DIF prototype. The 
proposed approach is not intended to replace the 
CASPER tools. It offers enhancements to the ex- 
isting CASPER design flow. However, this does not 
restrict its use to only the CASPER tools. 

The organization of the rest of this paper is as 
follows. Section 2 describes a TDD application. Sec- 
tion 3.1 describes dataflow modeling in detail, along 
with some of the relevant forms of dataflow (dataflow 
models) that are employed in practice. A reader who 
is familiar with dataflow formalisms may skip this 
section. Section 3.2 provides information about the 
DIF tool, while Section 3.3 highlights some of the 
relevant prior work. Section 4 explains how a DIF 
prototype can be used to develop a hardware imple- 
mentation. Section 5 provides a summary and our 
conclusions. 

2. Tunable Digital Downconverter 

In the DSP literature, the terms downsampling 
and decimation are often used interchangeably. In 
this paper, a decimator refers to a block that simply 
decimates or downsamples the input signal without 
any other processing (e.g., see Fig. 1(a) and (b)). 
The ratio of the sampling rate at the input of a dec- 
imator to that at its output is referred to as its deci- 
mation factor. A decimator is generally preceded by 
an anti-aliasing filter [Vaidyanathan, 1990]. In this 
paper, we refer to such a combined structure, consist- 
ing of a filter and decimator, as a decimation filter 
(e.g., see Fig. 2(a) and (b)). In a polyphase imple- 
mentation of a decimation filter, such as the one we 
use in our implementation, this structure is implemented 
as a single computing block [Vaidyanathan, 
1990]. We refer to the system or application that employs 
a decimator or decimation filter, possibly with 
other blocks such as mixers and filters, as a digital 
downconverter, and in particular, an FDD or TDD 
(e.g., see Fig. 3 and Fig. 4). The decimation factor 
of a decimation filter, TDD, or FDD refers to that of 
the decimator in it. 

Fig. 3 shows a block diagram of a TDD appli- 
cation. An 8-bit analog-to-digital converter (ADC) 
receives a baseband input IF signal of bandwidth 
800 MHz and samples it at the sampling rate of 
1.6 giga-samples/second (GS/s). The internal design 
of the ADC block is such that 8 consecutive 
time samples, where each sample is an 8-bit fixed-point 
number, are output on the eight 8-bit buses 
at the same clock pulse. This results in 200 mega-samples/second 
(MS/s) on each of the outputs of 
the ADC block. Correspondingly, all the down- 
stream blocks also have 8 input and output ports. 
Thus, there are 8 connections between any two blocks 
shown in Fig. 3 that are directly connected. We have 
not shown all 8 connections in detail for the sake of 
clarity and simplicity. 

The TDD subsystem, identified by the dotted box 
in Fig. 3, extracts a subband of the input signal 
with a user-specified center frequency (C_f) and bandwidth 
(B_w), downconverts it to baseband, and then 
downsamples it to the Nyquist rate. For example, 
Fig. 5 shows two of the possible configurations of 
B_w and C_f and the corresponding frequency bands 
that are extracted. The output of the TDD can be 
used by the downstream DSP blocks. For example, 
a possible scheme can have a TDD implementation 
on the IBOB. The downstream DSP blocks may in- 
clude functions such as polyphase filtering and fast 
Fourier transform. These blocks can be implemented 
on different hardware. This is possible using a communication 
link between two hardware boards that 
behaves as a FIFO buffer. An Ethernet link using the 
10 Gigabit attachment unit interface (XAUI) ports available 
on the IBOB is an example of such a link. 

During narrow-band observations, the Nyquist 
sampled output of the TDD will be analyzed with an 
existing spectrometer. The same number of spectral 
channels will thus provide proportionately greater 
spectral resolution as compared to analyzing the en- 
tire input bandwidth. Our TDD design supports 
integer decimation factors between 5 and 12. The 
choice of these values stems purely from the initial 
specification of the Green Bank Ultimate Pulsar Pro- 



cessing Instrument (GUPPI) [Ford and Ray, 2010]. 
This should be considered simply as a demonstra- 
tive implementation. The approach presented in this 
paper does not restrict the design in any way from 
having different specifications. The valid values of 
C_f corresponding to the selected B_w can vary so as 
to span the entire 800 MHz IF input. 
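
The sample-rate arithmetic in this section can be checked with a short sketch. It is illustrative only: the function and constant names are hypothetical, and the relation between decimation factor and output bandwidth assumes a real-valued output sampled exactly at the Nyquist rate, which the text implies but does not state as a formula.

```python
# Illustrative arithmetic only; all names here are hypothetical.
ADC_RATE_MSPS = 1600        # 1.6 GS/s ADC sampling rate
ADC_PARALLEL_OUTPUTS = 8    # samples presented on each clock pulse


def per_port_rate_msps():
    """Sample rate seen on each of the eight ADC output buses."""
    return ADC_RATE_MSPS / ADC_PARALLEL_OUTPUTS


def tdd_output(decimation_factor):
    """Return (output sample rate in MS/s, output bandwidth in MHz).

    Assumes a real-valued output sampled at the Nyquist rate, so the
    bandwidth is half of the output sample rate.
    """
    if not 5 <= decimation_factor <= 12:
        raise ValueError("the demonstrated TDD supports factors 5 to 12")
    rate = ADC_RATE_MSPS / decimation_factor
    return rate, rate / 2


if __name__ == "__main__":
    print("per-port rate:", per_port_rate_msps(), "MS/s")
    for d in range(5, 13):
        rate, bandwidth = tdd_output(d)
        print(f"D={d:2d}: {rate:7.2f} MS/s, {bandwidth:6.2f} MHz")
```

Under this assumption, a decimation factor of 8 yields a 100 MHz subband out of the 800 MHz input.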

As shown in Fig. 3, the TDD includes a tunable 
finite impulse response (FIR) filter. If the desired 
output is a baseband signal, then the FIR filter sim- 
ply acts as a low-pass filter. Also, in this case, the 
fork (which can be viewed as a dataflow version of 
a signal splitting block) and select (which is simi- 
lar to a multiplexer) blocks are configured to route 
the output of the FIR filter directly to the tunable 
decimation filter (TDF), bypassing the mixer. 

If the desired output is not a baseband signal, the 
FIR filter acts as a bandpass filter (BPF). The cut-off 
frequencies for this BPF are set using the specified 
parameter configuration (B_w and C_f). In this case, 
the output of the BPF is fed to a real mixer, which 
translates it into a baseband signal. The local oscillator, 
with a frequency f_LO, is implemented as a 
numerically controlled oscillator (NCO). The frequency 
f_LO depends on the values of C_f and B_w. The 
output of the mixer is then fed to the TDF, which 
downsamples its input depending upon the specified 
B_w or decimation factor. We have used this scheme 
in order to have a real-valued TDF output. 

Such a TDD, which was originally designed for 
the GUPPI at the National Radio Astronomy Ob- 
servatory (NRAO), Green Bank, finds its use in the 
spectrometers currently under development for the 
Green Bank Telescope (GBT) and 20 m telescope at 
the NRAO, Green Bank. 

3. Background 

3.1. Dataflow Modeling 

Dataflow modeling involves representing an application 
using a directed graph G(V, E), where V is 
a set of vertices (nodes) and E is a set of edges. 
Each vertex u ∈ V in a dataflow graph is called 
an actor, and represents a specific computational 
block, while each directed edge (u, v) ∈ E represents 
a first-in-first-out (FIFO) buffer that provides 
a communication link between the source actor u 
and the sink actor v. A dataflow graph edge e can 
also have a non-negative integer delay, del(e), associated 
with it, which represents the number of initial 
data values (tokens) present in the associated buffer. 
Dataflow graphs operate based on data-driven exe- 
cution, where an actor can be executed (fired) when- 
ever it has sufficient amounts of data (numbers of 
"samples" or data "tokens") available on all of its 
inputs. Typically, in DSP-oriented dataflow design 
environments, the execution of a dataflow graph can 
be thought of as that of a "globally asynchronous 
locally synchronous" (GALS) system [Suhaib et al., 
2008; Shen and Bhattacharyya, 2009]. 

During each firing, an actor consumes a certain 
number of tokens from each input and produces a certain 
number of tokens on each output. When these 
numbers are constant (over all firings), we refer to 
the actor as a synchronous dataflow (SDF) actor [Lee 
and Messerschmitt, 1987]. For an SDF actor, the 
numbers of tokens consumed and produced in each 
actor execution are referred to as the consumption 
rate and production rate of the associated input and 
output, respectively. If the source and sink actors 
of a dataflow graph edge are SDF actors, then the 
edge is referred to as an SDF edge, and if a dataflow 
graph consists of only SDF actors and SDF edges, 
the graph is referred to as an SDF graph. 

For a dataflow graph edge e, src(e) and snk(e) denote 
its source and sink actors, and if e is an SDF 
edge, then prd(e) denotes the production rate of the 
output port of src(e) that is connected to e, and sim- 
ilarly, cns(e) denotes the consumption rate of the in- 
put port of snk(e) that is connected to e. 

A static schedule for a dataflow graph G is a se- 
quence of actors in G that represents the order in 
which actors are fired during an execution of G. 
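
A sketch of how firing counts for a static schedule can be derived from the notation above. It relies on the standard SDF balance equations from the SDF literature (prd(e) times the firing count of src(e) equals cns(e) times the firing count of snk(e)), which this section does not spell out. The three-actor graph and its rates are hypothetical, and the function is not part of the DIF tool.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def repetitions_vector(actors, edges):
    """Solve the balance equations prd(e)*q[src(e)] == cns(e)*q[snk(e)].

    actors: list of actor names (the graph is assumed to be connected).
    edges:  list of (src, snk, prd, cns) tuples.
    Returns the smallest positive integer solution as a dict, or None
    if the rates are inconsistent.
    """
    q = {actors[0]: Fraction(1)}
    progress = True
    while progress:  # propagate firing ratios along the edges
        progress = False
        for src, snk, prd, cns in edges:
            if src in q and snk not in q:
                q[snk] = q[src] * prd / cns
                progress = True
            elif snk in q and src not in q:
                q[src] = q[snk] * cns / prd
                progress = True
    if len(q) != len(actors):
        raise ValueError("graph is not connected")
    if any(q[s] * p != q[t] * c for s, t, p, c in edges):
        return None  # no schedule can keep every buffer balanced
    scale = reduce(lambda a, b: a * b // gcd(a, b),
                   (f.denominator for f in q.values()), 1)
    counts = {a: int(q[a] * scale) for a in actors}
    common = reduce(gcd, counts.values())
    return {a: n // common for a, n in counts.items()}


# Hypothetical chain A -> B -> C with (prd, cns) = (2, 3) and (1, 2).
chain = repetitions_vector(["A", "B", "C"],
                           [("A", "B", 2, 3), ("B", "C", 1, 2)])
print(chain)
```

For this hypothetical chain, one valid static schedule fires A three times, then B twice, then C once, after which every buffer returns to its initial state.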

Usually, production and consumption information, 
in particular the number of tokens produced and 
consumed by individual firings (the production or 
consumption volume), is characterized in terms of 
individual input and output ports so that each port 
of an actor can in general have a different production 
or consumption volume characterization. Such 
characterizations can involve constant values as in 
SDF [Lee and Messerschmitt, 1987] (as described 
above); periodic patterns of constant values, as in 
cyclo-static dataflow (CSDF) [Bilsen et al., 1996]; or 
more complex forms that are data-dependent (e.g., 
see [Buck, 1993; Bhattacharya and Bhattacharyya, 
2000; Murthy and Lee, 2002; McAllister et al., 2004; 
Plishker et al., 2008]). A meta-modeling technique 
called parameterized dataflow (PDF) allows limited 
forms of dynamic behavior [Bhattacharya and Bhattacharyya, 
2000] in terms of run-time changes to 
dataflow graph parameters. The Boolean dataflow 
(BDF) [Buck, 1993] and core functional dataflow 
(CFDF) [Plishker et al., 2008] models are highly expressive 
(Turing complete) dynamic dataflow models. 
We explain the SDF, CSDF, and PDF models 
in greater detail later in this section. 

Apart from DIF, which we have mentioned earlier, 
there are various existing design tools with their 
semantic foundations in dataflow modeling, such 
as Ptolemy [Pino et al., 1995], LabVIEW [Johnson, 
1997], StreamIt [Thies et al., 2002], CAL [Eker 
and Janneck, 2003], PeaCE [Kwon et al., 2004], 
Compaan/Laura [Stefanov et al., 2004], and SysteMoC 
[Haubelt et al., 2007]. Dataflow-oriented 
DSP design tools typically allow high-level application 
specification, software simulation, and possibly 
synthesis for hardware or software implementation 
[Bhattacharyya et al., 2010]. 

3.1.1. Synchronous Dataflow 

An SDF graph is characterized by its compile-time 
predictability through the statically known consumption 
and production rates, as defined above. Fig. 6 
shows a simple SDF graph having actors W, X, Y, and Z 
(shown as circles or vertices of the graph). Each edge 
(an arrow in the figure connecting a pair of actors) 
is annotated with the number of tokens produced on 
it by the source actor and that consumed from it by 
the sink actor during every invocation of the source 
and sink actors, respectively. For example, actor X 
can be fired when there are at least two tokens on 
its input. Whenever actor X is fired, it consumes two 
tokens from its input buffer, and produces three tokens 
onto the output buffer connected to Y and two 
tokens onto the output buffer connected to Z. 
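
The firing rule for actor X can be mimicked in a few lines. This is a minimal sketch, not DIF code: the class, buffer names, and placeholder token values are hypothetical, and only the rates stated above for X (consume two; produce three toward Y and two toward Z) are taken from the text.

```python
from collections import deque


class SDFActor:
    """An actor with fixed consumption and production rates per port."""

    def __init__(self, name, inputs, outputs):
        self.name = name
        self.inputs = inputs    # list of (FIFO buffer, consumption rate)
        self.outputs = outputs  # list of (FIFO buffer, production rate)

    def can_fire(self):
        # Data-driven rule: enough tokens must be present on every input.
        return all(len(buf) >= cns for buf, cns in self.inputs)

    def fire(self):
        if not self.can_fire():
            raise RuntimeError(f"{self.name} lacks input tokens")
        for buf, cns in self.inputs:
            for _ in range(cns):
                buf.popleft()
        for buf, prd in self.outputs:
            buf.extend([self.name] * prd)  # placeholder token values


# FIFO buffers for the edges around X.
w_to_x, x_to_y, x_to_z = deque(), deque(), deque()
actor_x = SDFActor("X", [(w_to_x, 2)], [(x_to_y, 3), (x_to_z, 2)])

w_to_x.append("t0")
ready_with_one_token = actor_x.can_fire()   # only one token so far
w_to_x.append("t1")
ready_with_two_tokens = actor_x.can_fire()  # firing rule now satisfied
actor_x.fire()
print(len(w_to_x), len(x_to_y), len(x_to_z))
```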

3.1.2. Cyclo-static Dataflow 

Many signal processing applications involve be- 
haviors in which production and consumption rates 
may change during run-time. In some cases, these 
changes may, however, be known at compile-time. 
For example, consider the CSDF graph shown in 
Fig. 1(a), which has a decimator actor M in it. This 
actor consumes one token from its input on each in- 
vocation, but produces a token onto its output only 
on every fourth invocation. This behavior has been 
depicted using the varying production volumes denoted 
by [1 0 0 0]. The numbers of tokens produced 
by the decimator M follow this cyclic pattern with 
a period of 4. This sequence of varying production 
volumes, though not leading to constant output 
rates like an SDF actor, is still completely deterministic 
and known at compile time. This kind of 
dataflow behavior, where actors exhibit token pro- 
duction and consumption volumes (in terms of to- 
kens per firing on specific actor ports) that are either 
constant or expressible as cyclic sequences of con- 
stant volumes, is referred to as CSDF. Thus, CSDF 
can be viewed as a generalization of SDF in which 
token production and consumption volumes may be 
different across different firings of an actor, but fol- 
low cyclic patterns that are completely specified at 
compile time. 

We refer readers to [Bilsen et al., 1996] for more 
details on the CSDF model. As shown in Fig. 1(a) 
and Fig. 1(b), it may be possible to transform a 
CSDF actor into an SDF actor. In general, when 
feedback loops are present in a dataflow graph, such 
a transformation may introduce deadlock, and there- 
fore should be attempted with caution. Such a trans- 
formation, when admissible (not leading to dead- 
lock), generally has trade-offs in terms of relevant 
metrics including latency, throughput, and code size. 
More detailed comparisons between the SDF and 
CSDF models of computation are presented in [Parks 
et al., 1995] and [Bhattacharyya et al., 2000]. 
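
The CSDF decimator and its SDF counterpart can be contrasted with a small sketch. The class and function names are hypothetical. The production pattern of one token followed by three empty firings is taken from the description of actor M above, while the choice of which sample each version keeps is an assumption made so that the two produce the same stream.

```python
class CSDFDecimator:
    """Consumes one token per firing; produces per a cyclic pattern."""

    def __init__(self, factor):
        self.pattern = [1] + [0] * (factor - 1)  # e.g. [1, 0, 0, 0]
        self.phase = 0

    def fire(self, token):
        produce = self.pattern[self.phase]
        self.phase = (self.phase + 1) % len(self.pattern)
        return [token] if produce else []


def sdf_decimator_fire(block):
    """SDF version: consumes len(block) tokens and produces one."""
    return [block[0]]


samples = list(range(8))

csdf = CSDFDecimator(4)
csdf_out = []
for s in samples:              # eight firings, one token each
    csdf_out.extend(csdf.fire(s))

sdf_out = []
for i in range(0, len(samples), 4):  # two firings, four tokens each
    sdf_out.extend(sdf_decimator_fire(samples[i:i + 4]))

print(csdf_out, sdf_out)
```

Both versions emit the same tokens, but the CSDF actor can emit its first output after a single input token, whereas the SDF actor must wait for four. This is one instance of the latency trade-off mentioned above.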

3.1.3. Parameterized Dataflow 

Though CSDF provides enhanced expressive 
power compared to SDF, it is still unable to spec- 
ify patterns in token consumption and production 
volumes that are not fully known at compile time. A 
meta-modeling technique called PDF has been pro- 
posed to represent certain kinds of dataflow appli- 
cation dynamics [Bhattacharya and Bhattacharyya, 
2000]. This model can be used with any arbi- 
trary dataflow graph format that has a well-defined 
notion of a schedule iteration. For example, the 
PDF meta-model, when combined with an under- 
lying SDF model, results in the PSDF (parameter- 
ized synchronous dataflow) model. A PSDF graph 
behaves like an SDF graph during one schedule iter- 
ation, but can assume different configurations across 
different schedule iterations. 

The PDF meta-model supports semantic and syn- 
tactic hierarchy. Syntactic hierarchy is used, as in 
other forms of dataflow, to decompose complex de- 
signs in terms of smaller components. On the other 
hand, semantic hierarchy in PDF is used to apply 
specific features in the meta-model that are associated 
with dynamic parameter reconfiguration. A hierarchical 
actor that encapsulates such semantic hierarchy 
in PDF is called a PDF subsystem. A PDF 
subsystem in turn has three underlying graphs called 
the init, subinit, and body graphs, which interact with 
each other in structured ways. Intuitively, the init 
and subinit graphs can capture data-dependent, dy- 
namic behavior at certain points during the execu- 
tion of the graph and configure the body graph to 
adapt in useful ways to such dynamics. Intuitively, 
the init graph is designed to capture parameter con- 
figuration that is driven by higher, system-level pro- 
cessing, while the subinit graph is designed to cap- 
ture the parameter changes occurring across different 
iterations of the corresponding body graph. The init 
graph can be used to dynamically configure parame- 
ters in the subinit graph, which, in general, executes 
more frequently relative to the init graph. 

To further illustrate the PDF modeling technique, 
we consider the application example shown 
in Fig. 2(a). This example involves an FIR filter 
with filter taps or coefficients given by C_N = 
[c_0, c_1, ..., c_{N-1}] followed by a decimator with a tunable 
decimation factor of D. The values of D and 
C_N are set either through a higher level system or 
user interface. We skip the details of this mechanism 
for the sake of simplicity and conciseness. Such 
behavior can be modeled using PDF with an underlying 
CSDF model. Such a modeling approach is 
referred to as the parameterized cyclo-static dataflow 
(PCSDF) model [Saha et al., 2006]. Fig. 2(b) shows 
one of the possible PCSDF graphs corresponding to 
the application shown in Fig. 2(a). The subsystem 
DF is a PCSDF subsystem with its component 
graphs as shown in the figure. It can be seen here 
that the control actor in the DF.init graph of the DF 
subsystem sets the required external and internal parameters, 
D and C_N, respectively. This actor models 
the required parameter control through either a 
higher level system or some form of user interface. In 
this particular case, the DF.subinit graph is empty 
(in general, the init, subinit, and body graphs do not 
all have to be used for a given subsystem). 
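
The behavior of the DF subsystem can be approximated in software as follows. This is a rough sketch under several assumptions: the class and method names are hypothetical, `configure` merely stands in for the control actor of the init graph, the filter state is reset on reconfiguration, and one call to `run_iteration` models one schedule iteration of the body graph (D input tokens consumed, one output token produced). It is not the DIF API.

```python
class DecimationFilterSubsystem:
    """FIR filter followed by a decimator with reconfigurable D and taps."""

    def __init__(self):
        self.factor = 1
        self.coeffs = [1.0]
        self.history = [0.0]

    def configure(self, factor, coeffs):
        """Set D and the taps between iterations (role of the init graph)."""
        if factor < 1 or not coeffs:
            raise ValueError("need factor >= 1 and at least one tap")
        self.factor = factor
        self.coeffs = list(coeffs)
        self.history = [0.0] * len(self.coeffs)  # assumption: state reset

    def run_iteration(self, samples):
        """Consume exactly D tokens and return one filtered output token."""
        if len(samples) != self.factor:
            raise ValueError("expected exactly D input tokens")
        for x in samples:
            # Newest sample first; keep only as many samples as taps.
            self.history = ([x] + self.history)[:len(self.coeffs)]
        return sum(c * v for c, v in zip(self.coeffs, self.history))


df = DecimationFilterSubsystem()

df.configure(2, [0.5, 0.5])            # D = 2, two-tap averager
first = df.run_iteration([1.0, 3.0])

df.configure(4, [0.25] * 4)            # D = 4, four-tap averager
second = df.run_iteration([4.0, 4.0, 4.0, 4.0])

print(first, second)
```

Within one iteration the token rates are fixed, as in a static dataflow graph; they change only when the parameters are reconfigured between iterations.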

The PCSDF model allows CSDF actors for which 
the cyclic patterns of token production and consump- 
tion volumes can be parameterized in terms of their 
periods, the actual numbers of tokens consumed or 
produced in the cyclo-static sequences, or both. In- 
tuitively, for a given configuration of application pa- 
rameters, a PCSDF graph behaves as a CSDF graph. 
However, a PCSDF graph not only models all possible 
parameter configurations in a given application 
but also describes how they can be changed at run-time. 

Such a model is of particular interest for mod- 
eling multirate DSP systems that exhibit parame- 
terizable sample rate conversions. PCSDF allows 
designers to systematically explore design spaces 
across static, quasi-static, and dynamic implementation 
techniques. Here, by quasi-static implementation 
techniques, we mean techniques where relatively 
large portions of the associated software or hardware 
structures are fixed at compile-time with minor ad- 
justments allowed at run-time (e.g., in response to 
changes in input data or operating conditions). A 
variety of quasi-static dataflow techniques are discussed, 
for example, in [Bhattacharyya et al., 2010]. 

3.2. The Dataflow Interchange Format 

To describe dataflow graphs for a wide range 
of DSP applications, application developers can use 
the DIF language, which is a standard language 
founded in dataflow semantics and tailored for DSP 
system design [Hsu et al., 2005]. DIF provides an integrated 
set of syntactic and semantic features that 
help promote high-level modeling, analysis, and optimization 
of DSP applications and their implementations 
without over-specification. From a dataflow 
point of view, DIF is designed to describe mixed- 
grain graph topologies and hierarchies as well as to 
specify dataflow-related and actor-specific informa- 
tion. The dataflow semantic specification is based 
on dataflow modeling theory and independent of any 
design tool. 

Fig. 7 illustrates some of the available constructs 
in the DIF language along with the syntax used 
for application specification. More details on the 
DIF language can be found in [Hsu et al., 2007]. 
The topology block of the specification specifies the 
graph topology, which includes all of the nodes and 
edges in the graph. DIF supports built-in attributes 
such as interface, refinement, parameter, and 
actor, which identify specifications related to graph 
interfaces, hierarchical subsystems, dataflow parameters, 
and actor configurations, respectively. DIF also 
allows user-defined attributes, which have a similar 
syntax to built-in attributes except that they need 
to be declared with the attribute keyword. 

The DIF language has been recently augmented 
with constructs for supporting topological patterns 
[Sane et al., 2010]. Topological patterns allow 
concise specification of functional structures at the 
dataflow graph (inter-actor) level. They can effectively 
represent many of the flowgraph substructures 
that are pervasive in the DSP application domain 
(e.g., chain, ring, and butterfly) to generate compact, 
scalable application representations. We direct 
readers to [Sane et al., 2010, 2011] for more information 
on the concept of topological patterns and how 
DIF supports them. 

To facilitate use of the DIF language, the DIF 
package (TDP) has been built (see Fig. 8). Along 
with the ability to transform DIF descriptions into 
manipulable internal representations, TDP contains 
graph utilities, optimization engines, verification 
techniques, a comprehensive functional simulation 
framework, and a software synthesis framework for 
generating C code [Hsu et al., 2005; Plishker et al., 
2008]. These facilities make TDP an effective en- 
vironment for modeling dataflow applications, pro- 
viding interoperability with other design environ- 
ments, and developing and experimenting with new 
tools and dataflow techniques. Beyond these fea- 
tures, DIF is also suitable as a design environment 
for implementing dataflow-based application repre- 
sentations. Describing an application graph is done 
by listing nodes (actors) and edges, and then anno- 
tating dataflow specific information as well as other 
(non-dataflow) kinds of relevant information associ- 
ated with actors, edges, and design subsystems. 

The framework in DIF for simulation and func- 
tional verification of applications, which is based on 
CFDF semantics, allows application specifications in 
DIF to be used as executable references for rapid 
system prototyping and developing further platform- 
specific implementations. CFDF, which supports dy- 
namic dataflow behaviors, allows flexible and efficient 
prototyping of dataflow-based application represen- 
tations, and permits natural description of both dy- 
namic and static dataflow actors. More information 
on CFDF semantics can be found in [Plishker et al., 
2008]. 

3.3. Related Work 

There exist high-end reusable, modular, scal- 
able, and reconfigurable FPGA platforms such as 
the Berkeley Emulation Engine 2 (BEE2) [Chang 
et al., 2005], IBOB [Parsons et al., 2006], and UniBoard 
[Szomoru, 2011], which have been introduced 
specifically for DSP systems. These have been widely 
used for radio astronomy applications. The BEE2 
uses SDF as a unified computation model for both 
the microprocessor and the reconfigurable fabric. It 
uses a high-level block diagram design environment 
based on The Mathworks' Simulink and the Xilinx 
System Generator (XSG). This design environment, 
however, does not expose the underlying dataflow 
model. In particular, the designer has little or no 
scope to make use of the underlying dataflow model 
for experimentation (as mentioned earlier in Sec- 
tion 1). Also, the SDF model used for program- 
ming the BEE2 is a static dataflow model in that all 
the dataflow information is available at compile-time 
(i.e., before executing or running the application). 
Though this feature provides maximal compile-time 
predictability, it has limited expressive power. It 
does not allow for data-dependent, dynamic behav- 
ior, which is exhibited by many modern DSP ap- 
plications, such as the TDD application introduced 
in Section 2 (see [Bhattacharyya et al., 2010] for 
more examples of such applications). Other forms 
of dataflow models that can capture more applica- 
tion dynamics with acceptable levels of compile-time 
predictability may better exploit the features offered 
by platforms such as the BEE2. We should, how- 
ever, mention that the CASPER DSP library offers 
a software register block that can provide limited pa- 
rameterization in the design. We have used this block 
extensively in our TDD design. 

There are some other FPGA design solutions 
and tool flows available (e.g., those from Nallatech 
[Nallatech website] and Lyrtech [Lyrtech website]). 
These, however, are commercial tools and do 
not provide open-source DSP software libraries like 
CASPER. Also, CASPER tools support most of 
the Xilinx FPGA devices, unlike these other commercial 
tools. 

Model-based approaches for designing large-scale 
signal processing systems with a focus on radio telescopes 
have been previously studied (e.g., see [Alliot 
and Deprettere, 2004; Lemaitre and Deprettere, 2006; 
Lemaitre, 2008]). Several frameworks have been proposed 
for model-based, high-level abstractions of architectures 
along with performance/cost estimation 
methods to guide the designer throughout the de- 
velopment cycle (see [Alliot and Deprettere, 2004]). 
However, the focus of these approaches has been on 
architecture exploration. There have also been at- 
tempts to derive implementation-level specifications 
starting from system-level specifications by segregat- 
ing signal processing and control flow (see [Lee and 
Seshia, 2011] for more information on control flow) 
into an application specification and architecture 
specification, respectively (see [Lemaitre and Deprettere, 
2006; Lemaitre, 2008]). However, the choice 
of models of computation has been made primarily 
from control flow considerations rather than dataflow 
considerations. These approaches, though relevant, 
do not specifically address the issue of high-level ap- 
plication specification for platform-independent pro- 
totyping and use of models of computation for ab- 
straction of heterogeneous or hybrid dataflow behav- 
iors. This issue is critical to efficient prototyping 
of high performance signal processing applications, 
which are typically dataflow dominated, and include 
increasing levels of dynamic dataflow behavior (e.g., 
see [Bhattacharyya et al., 2010]). 

We address this issue using the CFDF model with 
underlying PSDF or PCSDF behavior and using it 
for system prototyping. We then show how platform- 
independent specifications based on this modeling 
technique can be used to efficiently develop platform- 
specific implementations. 

4. Dataflow-based Design and Implementation 
of a TDD 

We propose an approach for design and implemen- 
tation of a TDD based on the dataflow formalisms 
discussed in Section 3.1 along with relevant capabil- 
ities of the DIF tool described in Section 3.2. Fig. 9 
gives an overview of our dataflow based approach, 
which we now describe. 

4.1. Modeling and Prototyping using DIF 

We start with an application specification that describes
the DSP algorithm under consideration (in
this case, the TDD) along with proper input and
output interfaces. The application is specified using
the DIF language. This DIF specification consists of
topological information about the dataflow graph, namely
the interconnections between the actors along with input
and output interfaces. The DIF specification is
a platform-independent, high-level application specification.
The specification can be used, for example,
to simulate the application, given the library of actors
from which the specification is constructed.

Depending upon the application under consideration,
the designer can select among a variety of
dataflow models of computation in DIF to effectively
capture relevant aspects of the application dynamics.
It should be noted that the designer does not always
need to specify the model in advance. The CFDF
model can be used to describe individual modules
(actors) in the application, and the DIF package can
analyze the CFDF representation (CFDF modes, to
be specific) of the actors, as specified by the designer
through the actor code, and annotate the actors with
additional dataflow information using various techniques
for identifying specialized forms of dataflow
behavior (e.g., see [Plishker et al., 2010]). This step
requires the functionality of individual actors to be
specified in CFDF semantics. The designer can use
the existing blocks from the Java actor library in DIF
or develop his or her own library of CFDF actors.

In terms of tunability, the key components of the
TDD as seen from Fig. 3 are the tunable FIR filter
and decimation filter blocks. The tunable decimation
filter (TDF) block is of particular interest,
considering that it is the only multirate block in the
system. Its behavior resembles that of the one described
in Section 3.1.3. In view of this, we have
identified PSDF and PCSDF as candidate dataflow
models for efficient implementation of the targeted
TDD system. For this system, we have to take into
account the multiple inputs and outputs to actors,
as mentioned in Section 2.

To illustrate details of the dataflow behavior of 
a decimator actor based on such specifications, we 
have shown one such decimator actor with 4 inputs 
and outputs, and having a decimation factor of 6 in 
Fig. 10(a) and Fig. 10(b). The decimator simul- 
taneously receives 4 consecutive samples from its 4 
inputs. It outputs every sixth input sample starting 
with the first input sample. Each of these output 
samples appears on a successive output of the deci- 
mator. 
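
The port-level behavior described above can be sketched in a
few lines of Python. This is only an illustration and not the
DIF actor code: the function name, the frame representation,
and the rule that kept samples rotate over the output ports
modulo the number of outputs are assumptions made here.

```python
def parallel_decimate(frames, factor, num_outputs):
    """Decimate a stream that arrives as frames of parallel samples.

    frames: iterable of tuples; each tuple holds the consecutive samples
            presented simultaneously on the parallel inputs.
    Returns a list of (output_port, sample) pairs. The port rotation
    (kept-sample count modulo num_outputs) is an assumption.
    """
    outputs = []
    sample_index = 0   # position of the sample in the serial stream
    kept = 0           # number of samples kept so far
    for frame in frames:
        for sample in frame:
            if sample_index % factor == 0:   # keep every factor-th sample
                outputs.append((kept % num_outputs, sample))
                kept += 1
            sample_index += 1
    return outputs


# Six frames of 4 samples each: the serial stream is 0, 1, ..., 23.
frames = [tuple(range(4 * i, 4 * i + 4)) for i in range(6)]
result = parallel_decimate(frames, factor=6, num_outputs=4)
print(result)  # [(0, 0), (1, 6), (2, 12), (3, 18)]
```

With 4 inputs and a decimation factor of 6, samples 0, 6, 12,
and 18 are kept, and each lands on the next output port.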

For the sake of simplicity and clarity, we have excluded
the other single rate blocks from the application
graphs in these figures. In our implementation,
we extend this behavior for an actor with 8 inputs
and outputs. We have created a DIF prototype using
PSDF and PCSDF as underlying models for equivalent
CFDF representation of actor blocks. We have
also developed a Java library of actors in DIF adhering
to CFDF semantics for all of the blocks.

We then used DIF for software prototyping, analysis,
and functional simulation. The DIF package
uses the DIF specification to generate an intermediate
graph representation, which can then be used as
an input for further graph transformations including
a scheduling transformation, which determines
the schedule for an application. Here, by a schedule,
we mean the assignment of actors to processing
resources, and the execution ordering of actors
that share the same resource. The functional simulation
capabilities provided in DIF can be used to analyze
and estimate buffer requirements in terms of the
numbers of tokens accumulated on the buffers that
correspond to dataflow graph edges. This provides
an estimate of total memory requirements as well as
specifications for individual buffers when porting the
application to the targeted implementation platform.

Fig. 11 shows the TDD application graph gen- 
erated using DIF. This is based on the TDD block 
diagram shown in Fig. 3 with addition of some ac- 
tors that handle parameter configuration for the ac- 
tors. We discard one of the two sets of outputs (more 
specifically, sine output) of the localOsc actor as we 
have employed a real mixer in our design. The com- 
plexity of the graph, which is increased due to mul- 
tiple parallel edges between two actors, can easily be 
captured through a DIF specification that makes use 
of topological patterns. We have shown one of the 
possible specifications of the graph topology in DIF 
using topological patterns in Fig. 12. 

For our design, we have used parameterized looped
schedules (PLSs) [Ko et al., 2007] for PSDF and
PCSDF models to determine the total buffer requirements.
Using the TDD specification, we construct
PLSs for the TDD application. Fig. 13(a) shows a
PLS for a TDD application, where the decimator actor
has the underlying SDF model, while Fig. 13(b)
shows one in which the decimator actor employs the
CSDF model. We have used the generalized schedule
tree (GST) representation for the PLSs [Ko et al.,
2007]. An internal node of a GST denotes a loop
count, while a leaf node represents an actor. The execution
of a schedule involves traversing the GST in
a depth-first manner, and during this traversal, the
sub-schedule rooted at any internal node is executed
as many times as specified by the loop count of that
node. As annotated in these GSTs, loop counts p0,
p1, and p2 are parameterizable. The loop count p0
is set to a user-specified number of iterations, while
the loop counts p1 and p2 are tuned based upon the
decimation factor as well as the underlying dataflow
model for the decimator. Fig. 13(a) and (b), in
particular, show values of the parameterizable loop
counts set for a decimator with a decimation factor
of 11. This PLS can be viewed as providing CFDF-based
execution for the given PDF-based actor specification
model.
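
The depth-first execution rule for a GST can be illustrated
with a short Python sketch. The Loop class, the actor names,
and the tree shape below are hypothetical and do not reproduce
the schedules of Fig. 13; only the traversal rule (an internal
node repeats its sub-schedule as many times as its loop count)
is taken from the description above.

```python
class Loop:
    """Internal GST node: a loop count and an ordered list of children."""

    def __init__(self, count, children):
        self.count = count
        self.children = children


def execute(node, fire):
    """Traverse the tree depth-first; leaves are actor names (strings)."""
    if isinstance(node, str):
        fire(node)                      # leaf: fire the actor once
        return
    for _ in range(node.count):         # internal node: repeat sub-schedule
        for child in node.children:
            execute(child, fire)


def make_schedule(p0, p1):
    # Hypothetical tree shape: p0 outer iterations; within each iteration,
    # the upstream actors fire p1 times before the decimator and sink fire.
    return Loop(p0, [Loop(p1, ["source", "filter"]), "decimator", "sink"])


trace = []
execute(make_schedule(p0=2, p1=3), trace.append)
print(trace)
```

Changing p1 retunes the schedule for a different decimation
factor without rebuilding the tree, which is the property that
makes the loop counts worth parameterizing.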

Table 1 shows the total buffer requirements using
PLSs shown in Fig. 13(a) and (b) for various
configurations of decimation factors. Note that for
a given configuration (setting of graph parameters),
a PSDF or PCSDF graph behaves like an SDF or
CSDF graph, respectively. It can be seen that for
the SDF model, the total buffer requirements vary
with the decimation factor, and this is due to input
buffers to the TDD block that need to accumulate
varying numbers of tokens. Thus, employing the
PSDF model will require tuning buffer sizes for different
decimation factors if one wants to provide for
optimized buffer sizes in terms of graph parameters.
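
The dependence of the SDF buffer requirement on the decimation
factor can be seen in a minimal token-counting sketch. It
assumes a single edge from a unit-rate source to the decimator
and a schedule that fires the decimator as soon as it is
enabled. These are simplifying assumptions, so the numbers
below illustrate the trend only and are not the values of
Table 1.

```python
def peak_tokens_sdf(factor, num_samples):
    """Peak occupancy of the decimator input edge under an SDF model.

    The SDF decimator consumes `factor` tokens in a single firing, so
    tokens must accumulate on the edge before it can fire.
    """
    buffered = peak = 0
    for _ in range(num_samples):
        buffered += 1                   # source produces one token
        peak = max(peak, buffered)
        if buffered == factor:          # decimator becomes enabled
            buffered -= factor
    return peak


def peak_tokens_csdf(factor, num_samples):
    """Peak occupancy under a CSDF model with `factor` phases.

    Every phase consumes one token; only phase 0 would produce an
    output, so nothing accumulates on the input edge.
    """
    buffered = peak = 0
    phase = 0
    for _ in range(num_samples):
        buffered += 1                   # source produces one token
        peak = max(peak, buffered)
        buffered -= 1                   # current phase consumes it
        phase = (phase + 1) % factor
    return peak


for d in (5, 8, 11):
    print(d, peak_tokens_sdf(d, 10 * d), peak_tokens_csdf(d, 10 * d))
```

Under these assumptions the SDF peak grows with the decimation
factor while the CSDF peak stays at one token.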

We have used the CASPER tool flow for devel- 
oping our platform-specific implementation as ex- 
plained later in Section 4.2. This implementation 
is targeted to an FPGA. Our objective here is to 
support tuning the decimation factor without regen- 
erating hardware code. A dataflow buffer can be im- 
plemented using a FIFO or dual-port random access 
memory (RAM) block in the targeted FPGA device. 
The size of the available FIFO block can be set to
2^n, where n > 1. This gives limited control over setting
the FIFO size, and may increase the resource
utilization. At the same time, tuning the sizes of 
FIFO or dual-port RAM blocks is not possible dur- 
ing run-time. It is in general possible to set the size 
of a FIFO or dual-port RAM block to a maximum re- 
quired value, and access only a part of it using a tun- 
able address counter during run-time. This, however, 
again may lead to unnecessary increased resource uti- 
lization. The ADC output is of a streaming nature 
(data is produced or consumed at every clock cycle 
without any synchronization signal), as is the DSP 
subsystem downstream of the TDD. 
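
The cost of power-of-two FIFO depths can be made concrete with
a small sketch. The function below rounds a required token
count up to the next allowed depth. Reading the constraint as
"the smallest allowed exponent is 2" and using the decimation
factor itself as the required depth are assumptions made for
illustration only.

```python
def fifo_depth(required_tokens, min_exponent=2):
    """Smallest depth 2**n (n >= min_exponent) that holds required_tokens.

    min_exponent=2 is an assumed reading of the constraint n > 1.
    """
    n = min_exponent
    while 2 ** n < required_tokens:
        n += 1
    return 2 ** n


# Hypothetical requirement: one token per unit of decimation factor.
for factor in range(5, 13):
    depth = fifo_depth(factor)
    print(factor, depth, depth - factor)   # factor, depth, unused entries
```

For example, a requirement of 9 tokens forces a depth of 16,
leaving 7 entries unused, which is the kind of overhead the
paragraph above refers to.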

In order to achieve the throughput constraint imposed
by the maximum data rate of the ADC output
stream, SDF buffers need to be pipelined, which
is not efficient using RAM blocks. Thus, we use
the CSDF model, which does not require tuning
of dataflow buffer sizes to achieve the maximum
throughput constraint, as observed from our DIF-based
prototype. The TDD generates a synchronization
or enable signal indicating valid output data.
This can be used as a clock to drive the downstream
DSP system.

We use our DIF prototype as a reference while in- 
tegrating the design with the current CASPER tool 
flow for the target implementation on the IBOB. Sec- 
tion 4.2 further elaborates on this approach along 
with implementation results. 



4.2. Integration with the CASPER Tool Flow 

The CASPER tool flow is based on the BEE_XPS
tool flow [Parsons et al., 2006]. This tool flow requires
that an application be specified as a Simulink
model using XSG [Parsons et al., 2006]. Since
there is no automated tool for transforming a DIF
representation into an equivalent Simulink model,
porting the DIF specification to Simulink/XSG requires
manual transcoding of the DIF specification.
This also requires implementing parameterizable actor
blocks that are currently not available in the
XSG, CASPER, or BEE_XPS libraries.

Each actor gets transformed into an equivalent
functional XSG block. For each of the Simulink actor
blocks, we provide a pre-synthesis parameterization
that allows changing block parameters before hardware
synthesis (see [Parsons et al., 2007] for more
details on Simulink scripting). In order to implement
our objective of tunability, that is, post-synthesis
parameterization, we use the software register mechanism
in the BEE_XPS library to specify parameters
that change during run-time (that is, after hardware
code is generated, and depending upon user requirements).

Software registers can be accessed and set during 
run-time from the TinyShell interface available for 
IBOB. This allows tuning TDD parameters without 
re-synthesizing the hardware each time the parame- 
ters change from the previous setting. Each block has 
an enable input signal. Through systematic trans- 
formations, an application graph in DIF can be con- 
verted into an equivalent Simulink/XSG model. We 
have developed an interface software package using 
C programs, and Bash and Python scripts to com- 
pute software register values for the required TDD 
configuration, and set these values on the IBOB over 
a telnet connection, which is used for remote access 
to the hardware platform at NRAO. 

On the targeted FPGA device, we have employed 
the NCO using dual-port RAM blocks that are 
loaded with pre-computed sinusoidal signal values of 
the required precision. Each of these dual-port RAM 
blocks is used to simultaneously read sine and co- 
sine values from both of its ports. The oscillator fre- 
quency is set using a software register, and depends 
upon the desired output signal band. 
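
A table-based oscillator of this kind can be sketched as
follows. The sketch assumes a conventional phase-accumulator
structure in which the register value is a per-sample step
through the table. The table size, word width, and clock and
output frequencies are hypothetical values chosen for
illustration, not parameters of the IBOB design.

```python
import math


def build_tables(size, bits):
    """Pre-compute one period of cosine and sine as signed integers."""
    scale = 2 ** (bits - 1) - 1
    cos_table = [round(scale * math.cos(2 * math.pi * k / size))
                 for k in range(size)]
    sin_table = [round(scale * math.sin(2 * math.pi * k / size))
                 for k in range(size)]
    return cos_table, sin_table


def phase_step(f_out_mhz, f_clk_mhz, size):
    """Register value (assumed meaning): table entries advanced per sample."""
    return round(f_out_mhz / f_clk_mhz * size) % size


def nco_samples(step, table, count):
    """Read `count` samples by stepping a phase accumulator through the table."""
    phase = 0
    samples = []
    for _ in range(count):
        samples.append(table[phase])
        phase = (phase + step) % len(table)
    return samples


cos_table, sin_table = build_tables(size=1024, bits=8)   # hypothetical sizes
step = phase_step(f_out_mhz=200, f_clk_mhz=800, size=1024)
print(step, nco_samples(step, cos_table, 4))  # 256 [127, 0, -127, 0]
```

Changing the output frequency only changes the step value, so
the tables themselves never need to be regenerated.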

In our current implementation, the TDF block
(see Fig. 3) can have up to 16 filter taps. We
have also implemented a tunable FIR filter block,
which does not decimate, shown in Fig. 3. This
block can have up to 8 taps in our implementation.
These, again, are set using software registers.
Fig. 4(b) shows the schematic of a TDF. As shown
in this figure, we have employed two filter banks
(16-tap units) inside our design of a TDF block that
operate in tandem to allow maximum throughput
(that is, the maximum data rate of the ADC output
stream). Hence, our TDF block has 32 multiplication
operations. As mentioned earlier, our TDF
design employs a polyphase implementation as described
in [Vaidyanathan, 1990]. The software computes
the sequence in which the input signals should
be routed to an appropriate filter tap for a given decimation
factor. This information is then fed to the
signal routing scheme using software registers.
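
The polyphase idea can be checked numerically with a short
sketch. It compares a direct "filter, then keep every D-th
output" computation against a polyphase form in which tap k is
assigned to branch k mod D. This illustrates the general
technique from [Vaidyanathan, 1990] only. It does not model
the routing sequence or the two 16-tap units of the TDF, and
the function names and test data are assumptions.

```python
def direct_decimate(x, h, d):
    """FIR-filter x with taps h, evaluating only every d-th output."""
    y = []
    for n in range(0, len(x), d):
        acc = 0
        for k, coeff in enumerate(h):
            if n - k >= 0:
                acc += coeff * x[n - k]
        y.append(acc)
    return y


def polyphase_decimate(x, h, d):
    """Same result, organized as d sub-filters running at the low rate."""
    num_out = (len(x) + d - 1) // d
    y = [0] * num_out
    for p in range(d):
        sub = h[p::d]                    # branch p holds taps h[p], h[p+d], ...
        for m in range(num_out):
            for j, coeff in enumerate(sub):
                idx = (m - j) * d - p    # input sample feeding this tap
                if 0 <= idx < len(x):
                    y[m] += coeff * x[idx]
    return y


x = list(range(1, 41))     # hypothetical integer test signal
h = list(range(1, 17))     # hypothetical 16-tap filter
for d in range(5, 13):     # decimation factors 5 through 12
    assert direct_decimate(x, h, d) == polyphase_decimate(x, h, d)
print("polyphase and direct forms agree")
```

Because each branch runs at the output rate, no multiplication
is spent on samples that would be discarded, which is why the
polyphase form suits a throughput-constrained decimator.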

Table 2 shows results for the TDD implementa- 
tion on the IBOB using the Xilinx EDK 7.1.2. We 
have used this hardware platform and tool for all of 
the experiments reported in the remainder of the pa- 
per. Design 1 shows some of the device utilization 
parameters for a TDD that supports only baseband 
modes. This design does not include the tunable 
FIR filter, NCO, and mixer blocks shown in Fig. 3. 
Design 2 is based on the block diagram of a TDD 
shown in Fig. 3. As evaluation metrics for hardware 
cost, we have used the utilization of FPGA slices, 4-
input look-up tables (LUTs), and block RAM units,
and the number of embedded multipliers. Note that 
neither of these two designs use any of the available 
embedded multipliers for multiplication. Designs 3 
and 4 are modified versions of designs 1 and 2, re- 
spectively, in that they employ embedded 18 x 18 
multipliers. It can be seen that using embedded mul- 
tipliers does not provide significant improvements in 
hardware cost. We observe that use of embedded 
multipliers, in fact, needs to be accompanied by ad- 
dition of extra latency in the design to achieve tim- 
ing closure. We have been able to achieve maximum 
throughput using an implementation based on the 
PCSDF model. 

4.3. Platform-specific Analysis using DIF 

It is common to go back and forth between a high-level
prototype and a corresponding platform-specific
implementation while designing an embedded DSP
system. Such alternation in design phases is common,
for example, when one is developing a platform-specific
library or tool flow. In support of such a design
methodology, it is desirable for a high-level design
tool to support platform-specific analysis. This
can be achieved by annotating the high-level application
specification with platform-specific implementation
parameters, which are derived through device
data sheets, experimentation or some combination of
both.

DIF supports specifying user-defined actor param- 
eters. We use this feature in DIF to annotate ac- 
tors with two relevant implementation parameters 
— the latency constraint, and number of embedded 
multipliers. This allows estimating results based on 
the DIF prototype itself instead of determining them 
from the constructed design, which is generally time 
consuming. We have verified the accuracy of metrics 
estimated by our DIF model compared with actual 
hardware synthesis results that are shown in Table 2. 
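
The annotation-based estimation can be pictured with a minimal
sketch. Each actor carries user-defined parameters, and the
estimate is obtained by aggregating them over the graph. The
actor names, the numeric values, and the aggregation rule
(summing latencies along a chain and summing multiplier
counts) are hypothetical placeholders, not values from Table 2
and not the DIF API.

```python
# Hypothetical per-actor annotations; every number is a placeholder.
annotations = {
    "bpf":       {"latency": 4, "multipliers": 8},
    "mixer":     {"latency": 2, "multipliers": 8},
    "decimator": {"latency": 6, "multipliers": 32},
}


def estimate(chain, annotations):
    """Aggregate annotated parameters over a chain of actors.

    Assumes latencies add along the chain and multipliers are not shared.
    """
    total_latency = sum(annotations[a]["latency"] for a in chain)
    total_multipliers = sum(annotations[a]["multipliers"] for a in chain)
    return total_latency, total_multipliers


latency, multipliers = estimate(["bpf", "mixer", "decimator"], annotations)
print(latency, multipliers)  # 12 48
```

Such an estimate is available as soon as the graph is
specified, before any hardware synthesis is run.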

Developers of tool flows and DSP libraries can pro- 
file their library blocks to determine a wide variety of 
platform-specific implementation parameters. DIF 
can use such information to estimate implementa- 
tion parameters at a high-level of abstraction, and 
earlier in the design cycle to help efficiently prune 
segments of the design space. Support for estima- 
tion of various platform-specific resources for differ- 
ent platforms is beyond the scope of this paper. It 
is, however, an important direction toward develop- 
ing alternative model based design flows and open 
access tool flows for astronomical DSP solutions. 

4.4. Exploring Implementation Trade-offs between 
TDD and FDD Designs 

One of the motivations for the work presented in
this paper has been to develop library blocks needed
for a TDD using Xilinx LogicCore and CASPER library
blocks. The current CASPER DSP library
provides a decimator (see Fig. 4(a)) that supports
decimation factors that are powers of 2. The decimation
factor as well as the filter coefficients of the FIR
filter are not tunable after the hardware code is generated.
Our design provides flexibility with not only
the decimation factor but also the filter coefficients
through the use of software registers, as explained
earlier. The FDD designs, though not tunable, have
lower hardware cost in terms of device utilization.
Table 3 provides a summary of some of the hardware
utilization parameters for the FDD designs. These
designs have also been implemented on a CASPER
IBOB. The decimation factor of 10 has been achieved
by first interpolating the input by a factor of 80, and
then decimating it by a factor of 8. Comparison between
the results in this table and those in Table 2
clearly highlights the trade-off between design flexibility
and hardware cost. Using the model-based
approach presented in this paper, the designer can
effectively explore this trade-off based on the given
design requirements.

4.5. TDD and FDD for Multistage Downconversion 

Though our TDD design supports limited decimation
factors (integer factors between 5 and 12), its
usage is not limited to these factors. It can be readily
scaled and applied to achieve other decimation
factors by cascading multiple TDF blocks. Fig. 14
shows some of the possible input/output sampling
rate relations that can be achieved by such use of
cascaded TDF blocks. Design 1 in Table 4 employs
cascaded TDF blocks, while design 2 in Table 4 employs
cascaded fixed-configuration decimation filter
(FDF) blocks. Both of these designs have been developed
to demonstrate multistage downconversion
for a baseband signal and hence, do not employ mixers.
It is possible to extend these designs to include a
mixer to allow all possible narrow band outputs and
not just the baseband output. For all of the designs
in this table that use one or more TDF blocks, the
TDF block employs dedicated embedded multipliers.

In this light, we further explore the trade-off be- 
tween the low hardware cost of FDD designs and flex- 
ibility offered by TDD designs by examining a design 
consisting of an FDF block followed by a TDF block 
(designs 3 and 4 in Table 4). These designs provide 
limited tunable decimation factors compared to de- 
sign 1, but also have lower hardware cost in terms of 
device utilization. 
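
The reach of a two-stage scheme can be enumerated with a short
sketch. The overall decimation factor of a cascade is the
product of the stage factors. The TDF range of 5 to 12 is
taken from the text, whereas the fixed factor of 8 used for
the FDF stage is only an example and is not necessarily the
configuration of designs 3 and 4.

```python
TDF_FACTORS = range(5, 13)   # integer factors 5 through 12, per the text


def cascaded_factors(stage_a, stage_b):
    """Overall decimation factors reachable by two stages in series."""
    return sorted({a * b for a in stage_a for b in stage_b})


two_tdf = cascaded_factors(TDF_FACTORS, TDF_FACTORS)
fdf_then_tdf = cascaded_factors([8], TDF_FACTORS)   # example FDF factor of 8

print(two_tdf[0], two_tdf[-1])   # 25 144
print(fdf_then_tdf)              # [40, 48, 56, 64, 72, 80, 88, 96]
```

The fixed first stage reaches far fewer overall factors than
two tunable stages, which mirrors the flexibility versus cost
trade-off discussed above.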

5. Summary and Conclusions 

We have proposed a dataflow-based approach for
prototyping radio astronomy DSP systems. We have
used a dataflow-based high-level application model
that provides a platform-independent specification,
and assistance in functional verification and important
resource estimation tasks. This can prove effective
in shortening the development cycle and speeding
deployment of DSP systems across various target
platforms. We have employed this approach to methodically
develop a TDD based DSP backend design.
Our TDD implementation is targeted to the
CASPER FPGA board, called IBOB, and supports
tuning narrow band modes without the need for regenerating
hardware code. We have also explored
the trade-off between the low hardware cost for FDD
designs and the flexibility offered by TDD designs.
This trade-off has also been highlighted in the context
of designs employing a two-stage downconversion
scheme. A designer can explore this design space
to best meet the application requirements. Expanding
on our work to integrate TDDs with ongoing development
of spectrometer designs at the NRAO on
the latest CASPER hardware is a natural extension
of the work presented in this paper.

There is a growing interest in the radio astronomy 
community to have open-access and portable astro- 
nomical signal processing solutions. Currently, this is 
constrained by proprietary commercial tools targeted 
for specific platforms. We have also relied on these 
tools, mainly for hardware synthesis and code gener- 
ation, in our work. In this context, it is of interest 
to have high-level application description languages 
with semantic foundations in models of computation, 
and the corresponding design tools for efficient speci- 
fication, simulation, functional verification, and syn- 
thesis. Developing model based, platform-specific li- 
braries, and devising techniques for automatic code 
generation from high-level representations, such as 
those in DIF, specifically for the radio astronomy do-
main is an important direction for future research.

Acknowledgments. This research was sponsored in 
part by the National Radio Astronomy Observatory, Aus- 
trian Marshall Plan Foundation, and National Science 
Foundation (grant AGS-0959761 to New Jersey Institute 
of Technology). We acknowledge with thanks the contri-
butions of Shilpa Bollineni, Srikanth Bussa, Randy Mc-
Cullough, Scott Ransom, and Jason Ray of the National
Radio Astronomy Observatory. The National Radio As- 
tronomy Observatory is a facility of the National Science 
Foundation operated under cooperative agreement by As- 
sociated Universities, Inc. 



SANE ET AL.: DATAFLOW MODELS FOR RADIO ASTRONOMY DSP 






References 

Alliot, S., and E. Deprettere (2004), Architecture ex-
ploration of a large scale system, in Proceedings of
the IEEE International Workshop on Rapid System
Prototyping, pp. 217-224, Geneva, Switzerland,
doi:10.1109/IWRSP.2004.1311120.

Bhattacharya, B., and S. S. Bhattacharyya (2000), Pa- 
rameterized dataflow modeling of DSP systems, in 
Proceedings of the International Conference on Acous- 
tics, Speech, and Signal Processing, pp. 1948-1951, Is- 
tanbul, Turkey. 

Bhattacharyya, S. S., R. Leupers, and P. Marwedel
(2000), Software synthesis and code generation for
DSP, IEEE Transactions on Circuits and Systems
II: Analog and Digital Signal Processing, 47(9), 849-
875.

Bhattacharyya, S. S., E. Deprettere, R. Leupers, and 
J. Takala (Eds.) (2010), Handbook of Signal Processing 
Systems, Springer. 

Bilsen, G., M. Engels, R. Lauwereins, and J. A. Peper-
straete (1996), Cyclo-static dataflow, IEEE Transac-
tions on Signal Processing, 44(2), 397-408.

Buck, J. T. (1993), Scheduling dynamic dataflow graphs 
with bounded memory using the token flow model, 
Ph.D. thesis, EECS Department, University of Cali- 
fornia, Berkeley. 

CASPER Website (), Collaboration for astron- 
omy signal processing and electronics research, 
http://casper.berkeley.edu.

Chang, C., J. Wawrzynek, and R. W. Brodersen (2005),
BEE2: a high-end reconfigurable computing system,
Design & Test of Computers, IEEE, 22(2), 114-125,
doi:10.1109/MDT.2005.30.

Eker, J., and J. W. Janneck (2003), CAL language report, 
language version 1.0 — document edition 1, Tech. Rep. 
UCB/ERL M03/48, Electronics Research Laboratory, 
University of California at Berkeley. 

Ford, J., and J. Ray (2010), An application of high-
performance reconfigurable computing in radio as-
tronomy signal processing, in High-Performance Re-
configurable Computing Technology and Applications
(HPRCTA), Fourth International Workshop on, pp.
1-7, doi:10.1109/HPRCTA.2010.5670794.

Haubelt, C., J. Falk, J. Keinert, T. Schlichter,
M. Streubühr, A. Deyhle, A. Hadert, and J. Teich
(2007), A SystemC-based design methodology for dig-
ital signal processing systems, EURASIP Journal on
Embedded Systems, 2007, Article ID 47580, 22 pages.

Hsu, C., M. Ko, and S. S. Bhattacharyya (2005), Soft-
ware synthesis from the dataflow interchange format,
in Proceedings of the International Workshop on Soft- 
ware and Compilers for Embedded Systems, pp. 37-49, 
Dallas, Texas. 

Hsu, C., I. Corretjer, M. Ko, W. Plishker, and S. S.
Bhattacharyya (2007), Dataflow interchange format: 
Language reference for DIF language version 1.0, 
user's guide for DIF package version 1.0, Tech. Rep. 
UMIACS-TR-2007-32, Institute for Advanced Com- 
puter Studies, University of Maryland at College Park, 
also Computer Science Technical Report CS-TR-4871. 



Johnson, G. (1997), LabVIEW Graphical Programming: 
Practical Applications in Instrumentation and Con- 
trol, McGraw-Hill School Education Group. 

Ko, M., C. Zissulescu, S. Puthenpurayil, S. S. Bhat-
tacharyya, B. Kienhuis, and E. Deprettere (2007), Pa-
rameterized looped schedules for compact represen-
tation of execution sequences in DSP hardware and
software implementation, IEEE Transactions on Sig-
nal Processing, 55(6), 3126-3138.

Kwon, S., H. Jung, and S. Ha (2004), H.264 decoder al-
gorithm specification and simulation in Simulink and
PeaCE, in Proceedings of the International SoC Design 
Conference, pp. 9-12. 

Lee, E. A., and D. G. Messerschmitt (1987), Static
scheduling of synchronous dataflow programs for digi-
tal signal processing, IEEE Transactions on Comput-
ers, C-36(1), 24-35, doi:10.1109/TC.1987.5009446.

Lee, E. A., and S. A. Seshia (2011), Introduction to Em- 
bedded Systems, A Cyber-Physical Systems Approach, 
http://LeeSeshia.org.

Lemaitre, J. (2008), Model-based specification and de-
sign of large-scale embedded signal processing systems,
Ph.D. thesis, Leiden University, The Netherlands. 

Lemaitre, J., and E. Deprettere (2006), FPGA implemen- 
tation of a prototype hierarchical control network for 
Large-Scale signal processing applications, in Proceed- 
ings of the International Euro-Par Conference, Lec- 
ture Notes in Computer Science 4128, pp. 1192-1203, 
Springer, Dresden, Germany. 

Lyrtech website (), Lyrtech, http://www.lyrtech.com. 

McAllister, J., R. Woods, R. Walke, and D. Reilly (2004), 
Synthesis and high level optimisation of multidimen- 
sional dataflow actor networks on FPGA, in Proceed- 
ings of the IEEE Workshop on Signal Processing Sys- 
tems. 

Murthy, P. K., and E. A. Lee (2002), Multidimensional 
synchronous dataflow, IEEE Transactions on Signal 
Processing, 50 (8), 2064-2079. 

Nallatech website (), Nallatech,
http://www.nallatech.com.

Parks, T. M., J. L. Pino, and E. A. Lee (1995), A 
comparison of synchronous and cyclo-static dataflow, 
in Proceedings of the IEEE Asilomar Confer- 
ence on Signals, Systems, and Computers, vol. 1, 
pp. 204-210, Pacific Grove, California, doi:
10.1109/ACSSC.1995.540541. 

Parsons, A., et al. (2005), A new approach to radio as- 
tronomy signal processing, in Proceedings of the Gen- 
eral Assembly of the International Union of Radio Sci- 
ence. 

Parsons, A., et al. (2006), PetaOp/Second FPGA signal 
processing for SETI and radio astronomy, in Proceed- 
ings of the IEEE Asilomar Conference on Signals, Sys- 
tems, and Computers, pp. 2031-2035, Pacific Grove, 
California, doi:10.1109/ACSSC.2006.355123, invited
paper. 

Parsons, A., D. Chapman, and H. Chen (2007), Xil-
inx system generator for DSP in the CASPER group,
Tech. Rep. CASPER Memo 11, Center for Astronomy 
Signal Processing and Electronic Research, University 
of California, Berkeley. 

Pino, J. L., S. Ha, E. A. Lee, and J. T. Buck (1995), 
Software synthesis for DSP using Ptolemy, Journal of 
VLSI Signal Processing, 9(1).






Plishker, W., N. Sane, M. Kiemb, K. Anand, and S. S.
Bhattacharyya (2008), Functional DIF for rapid proto- 
typing, in Proceedings of the International Symposium 
on Rapid System Prototyping, pp. 17-23, Monterey, 
California. 

Plishker, W., et al. (2010), Model-based DSP imple-
mentation on FPGAs, in Proceedings of the Interna-
tional Symposium on Rapid System Prototyping, Fair-
fax, Virginia, invited paper.

Saha, S., S. Puthenpurayil, and S. S. Bhattacharyya 
(2006), Dataflow transformations in high-level DSP 
system design, in Proceedings of the International 
Symposium on System-on-Chip, pp. 131-136, Tam- 
pere, Finland, invited paper. 

Sane, N., H. Kee, G. Seetharaman, and S. S. Bhat-
tacharyya (2010), Scalable representation of dataflow
graph structures using topological patterns, in Pro- 
ceedings of the IEEE Workshop on Signal Processing 
Systems, San Francisco Bay Area, USA. 

Sane, N., H. Kee, G. Seetharaman, and S. Bhattacharyya
(2011), Topological patterns for scalable representa-
tion and analysis of dataflow graphs, Journal of Signal
Processing Systems, 65, 229-244,
doi:10.1007/s11265-011-0610-1.

Shen, C., and S. S. Bhattacharyya (2009), System- 
level clustering and timing analysis for GALS-based 
dataflow architectures, in Proceedings of the ACM In- 
ternational Workshop on Timing Issues in the Spec- 
ification and Synthesis of Digital Systems, Austin, 
Texas. 

Stefanov, T., C. Zissulescu, A. Turjan, B. Kienhuis, and
E. Deprettere (2004), System design using Kahn pro-
cess networks: the Compaan/Laura approach, in Pro-
ceedings of the Design, Automation and Test in Eu-
rope Conference and Exhibition, vol. 1, pp. 340-345,
doi:10.1109/DATE.2004.1268870.



Suhaib, S., D. Mathaikutty, and S. Shukla (2008), 
Dataflow architectures for GALS, Electronic Notes 
in Theoretical Computer Science, 200, 33-50, doi: 
10.1016/j.entcs.2008.02.005. 

Szomoru, A. (2011), The UniBoard: A multi-purpose 
scalable high-performance computing platform for 
radio-astronomical applications, in General Assembly
and Scientific Symposium, 2011 XXXth URSI, pp. 1- 
4, doi:10.1109/URSIGASS.2011.6051281. 

Thies, W., M. Karczmarek, and S. Amarasinghe (2002), 
StreamIt: A language for streaming applications, in
International Conference on Compiler Construction, 
Grenoble, France. 

Vaidyanathan, P. (1990), Multirate digital filters, filter
banks, polyphase networks, and applications: a tu-
torial, Proceedings of the IEEE, 78(1), 56-93, doi:
10.1109/5.52200. 



S. S. Bhattacharyya, Department of Electrical and 
Computer Engineering, and Institute for Advanced Com- 
puter Studies, University of Maryland, College Park, 
MD, 20742, USA. (ssb@umd.edu) 

J. Ford, National Radio Astronomy Observatory, 
Green Bank, WV, 24944, USA. (jford@nrao.edu) 

A. I. Harris, Department of Astronomy, University 
of Maryland, College Park, MD, 20742, USA.
(harris@astro.umd.edu)

N. Sane, Department of Physics, and Center for Solar- 
Terrestrial Research, New Jersey Institute of Technology, 
Newark, NJ, 07102, USA. (nimish.sane@njit.edu) 

(Received ) 






Figure 1. An application graph with a simple decimator
actor M using the (a) CSDF, and (b) SDF models. Actor
M is a decimator with a decimation factor of 4.



[Figure 2 diagram labels: FIR Filter (Cn), Decimator, DF.subinit.]

Figure 2. Modeling a parameterized decimation filter
(DF) application using PCSDF: (a) application graph,
where Cn denotes a vector of FIR filter coefficients and
D denotes a decimation factor, and (b) PCSDF representation.

Figure 3. Block diagram of a tunable digital downconverter.
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y[n] 



^ 

Decimation Factor and 
Signal routing sequence 
(Software Registers) 




(b) 

Figure 4. Schematic of (a) fixed-configuration decima- 
tion filter (FDF) in the CASPER library, and (b) tunable 
decimation filter (TDF) that is part of a TDD. The FDF 

achieves downconversion of 8 by having 8 parallel inputs 
x[n], x[n — 1], . . . , x[n — 7]. Here, ftO, ftl, /i7 denote 
the filter coefficients, and y[n] denotes the output. For 
TDF, 16-tap units are similar to the structure inside the 
dotted box shown in (a) with tunable filter taps. The 
TDF block has 8 inputs as well as 8 outputs. 
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160 



(a) 



800 f (MHz) 



320 640 800 f (MHz) 
(b) 



Figure 5. Two of the possible eonfigurations of a TDD: 
(a) B» = 160 MHz, C/ = 80 MHz (b) B^, = 320 MHz, 
Cf = 480 MHz. The colored area shows the extracted 
frequency band. 




dataflowModel graphID {
    basedon {
        graphID;
    }
    topology {
        nodes = nodeID, ...;
        edges = edgeID(srcNodeID, snkNodeID), ...;
    }
    builtInAttribute {
        elementID = value;
        elementID = id;
        elementID = id1, id2, ...;
    }
    attribute userDefinedAttribute {
        elementID = value;
        elementID = id;
        elementID = id1, id2, ...;
    }
}

Figure 7. The DIF language.
Figure 8. The DIF Package.



Figure 9. Dataflow-based approach for design and implementation of a TDD.
[Flowchart labels: Modeling; Porting; Implementation;
using the DIF language; actor library in DIF using CFDF
semantics; DIF prototype for analysis and functional
verification; platform-specific optimizations and
trans-coding; library of Simulink blocks; final hardware
implementation and testing.]
Figure 11. TDD application graph generated using DIF.
topology {
    nodes = source, copy, bpf, Merge, decimator, sink,
            control, fork[3], multiplier, localOsc, dump;
    edges = c0[6] -> chain(source, copy, bpf,
                multiplier, Merge, decimator, sink),
            c1[6] -> chain(source, copy, bpf,
                multiplier, Merge, decimator, sink),
            c2[6] -> chain(source, copy, bpf,
                multiplier, Merge, decimator, sink),
            c3[6] -> chain(source, copy, bpf,
                multiplier, Merge, decimator, sink),
            c4[6] -> chain(source, copy, bpf,
                multiplier, Merge, decimator, sink),
            c5[6] -> chain(source, copy, bpf,
                multiplier, Merge, decimator, sink),
            c6[6] -> chain(source, copy, bpf,
                multiplier, Merge, decimator, sink),
            c7[6] -> chain(source, copy, bpf,
                multiplier, Merge, decimator, sink),
            cpMrg[8] -> multiedge(copy, Merge),
            loMul[8] -> multiedge(localOsc, multiplier),
            loDump[8] -> multiedge(localOsc, dump),
            conFrk, f0f1, f1f2 -> chain(control, fork[0:2]),
            f0Bpf, f1Lo, f2Mrg, f2Dec -> parallel(fork[0:2],
                fork[2], bpf, localOsc, Merge, decimator);
}

Figure 12. Partial DIF specification (topology block)
for the TDD application graph using topological
patterns.
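
The topological patterns in Figure 12 can be read as shorthand for explicit edge lists. The sketch below expands them in Python under assumed semantics, inferred from the pattern names and from the edge counts declared in the figure: chain connects consecutive nodes, multiedge creates n parallel edges between one pair of nodes, and parallel connects the i-th source to the i-th sink. These helper functions are hypothetical and are not part of the DIF package.

```python
def chain(nodes):
    """Connect consecutive nodes: n nodes give n - 1 edges."""
    return list(zip(nodes, nodes[1:]))


def multiedge(src, snk, n):
    """n parallel edges between the same pair of nodes."""
    return [(src, snk)] * n


def parallel(srcs, snks):
    """Connect the i-th source to the i-th sink."""
    assert len(srcs) == len(snks)
    return list(zip(srcs, snks))


fork = ["fork_0", "fork_1", "fork_2"]
path = ["source", "copy", "bpf", "multiplier", "Merge", "decimator", "sink"]

edges = []
for _ in range(8):                                # c0[6] ... c7[6]
    edges += chain(path)
edges += multiedge("copy", "Merge", 8)            # cpMrg[8]
edges += multiedge("localOsc", "multiplier", 8)   # loMul[8]
edges += multiedge("localOsc", "dump", 8)         # loDump[8]
edges += chain(["control"] + fork)                # conFrk, f0f1, f1f2
edges += parallel(fork + [fork[2]],               # f0Bpf, f1Lo, f2Mrg, f2Dec
                  ["bpf", "localOsc", "Merge", "decimator"])

print(len(edges))
```

Under these assumptions, the 13 pattern statements in the figure expand to 79 explicit edges, which suggests how much the patterns shorten the specification.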
Figure 13. PLSs for the TDD application configured for
a decimation factor of 11, with the decimator actor
employing the (a) PSDF and (b) PCSDF models of computation.
[Labels recovered from the figure: fork_0_, fork_1_, bpf,
fork_2_, localOsc, Merge, decimator; p1 = 1; p2 = 11.]
Figure 14. Two-stage digital downconversion.
Table 1. Total buffer requirements from a DIF prototype
for different decimation factors using parameterized looped
schedules.

                     Total buffer requirements
                     (number of tokens)
Decimation factor    SDF      CSDF

5                    132      100
6                    140      100
7                    148      100
8                    156      100
9                    164      100
10                   172      100
11                   180      100
12                   188      100
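
The trend in Table 1 can be summarized numerically. The sketch below hard-codes the table values and checks an observation drawn from them, not a formula stated in the paper: over the tabulated range the SDF requirement grows by 8 tokens per unit increase in decimation factor (it equals 92 + 8D), while the CSDF requirement stays at 100 tokens.

```python
# Values copied from Table 1 (total buffer requirements, in tokens).
factors = [5, 6, 7, 8, 9, 10, 11, 12]
sdf = [132, 140, 148, 156, 164, 172, 180, 188]
csdf = [100] * 8

savings = []
for d, s, c in zip(factors, sdf, csdf):
    fitted = 92 + 8 * d             # empirical fit to the SDF column
    saving = 100.0 * (s - c) / s    # relative saving of CSDF over SDF, in percent
    savings.append(saving)
    print(f"D={d:2d}  SDF={s}  fit={fitted}  CSDF={c}  saving={saving:.1f}%")
```

With these numbers, the relative saving from the CSDF model rises from about 24% at D = 5 to about 47% at D = 12.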



Table 2. Implementation summary for TDD designs. In
all the designs below, the input bandwidth is 800 MHz, and
the decimation factor, D, is tunable such that 5 ≤ D ≤ 12.

Parameter                          Design 1      Design 2      Design 3      Design 4

Mixer                              No            Yes           No            Yes
Latency (ns)                       65            150           85            190
FPGA slices (Out of 23616)         12234 (52%)   13315 (56%)   12322 (52%)   14232 (60%)
4-input LUTs (Out of 47232)        14139 (29%)   16123 (34%)   12123 (25%)   15035 (31%)
Block RAMs (Out of 232)            41 (17%)      48 (20%)      41 (17%)      48 (20%)
18 x 18 Multipliers (Out of 232)   -             -             32 (13%)      95 (40%)



Table 3. Implementation summary for FDD designs. In all the designs below, the input bandwidth is 800 MHz.

Parameter                          Design 1      Design 2      Design 3      Design 4

Mixer                              No            No            Yes           Yes
Decimation factor                  8             10            8             10
Bw (MHz)                           100           80            100           80
Cf (MHz)                           50            40            400           400
Latency (ns)                       35            440           50            455
FPGA slices (Out of 23616)         4175 (17%)    6142 (26%)    5690 (24%)    6439 (27%)
4-input LUTs (Out of 47232)        5153 (10%)    5216 (11%)    5984 (12%)    6003 (12%)
Block RAMs (Out of 232)            41 (17%)      41 (17%)      49 (21%)      49 (21%)
18 x 18 Multipliers (Out of 232)   8 (3%)        8 (3%)        32 (13%)      32 (13%)



Table 4. Implementation summary for designs employing
two-stage downconversion using cascaded FDF or TDF blocks.
In all the designs below, the input bandwidth is 800 MHz.
None of these designs employs a mixer block.

Parameter                          Design 1      Design 2      Design 3      Design 4

No. of FDF blocks                  -             2             1             1
No. of TDF blocks                  2             -             1             1
FDF decimation factor(s)           -             8, 10         8             10
Bw (MHz)**                         Tunable       10            Tunable       Tunable
                                   (< 800)                     (< 100)       (< 80)
Latency (ns)                       170           475           120           505
FPGA slices (Out of 23616)         17141 (72%)   5765 (24%)    11073 (46%)   12641 (53%)
4-input LUTs (Out of 47232)        19718 (41%)   5506 (11%)    12245 (25%)   12310 (26%)
Block RAMs (Out of 232)            41 (17%)      41 (17%)      41 (17%)      41 (17%)
18 x 18 Multipliers (Out of 232)   64 (27%)      16 (6%)       40 (17%)      40 (17%)

** Bw, if tunable, can be tuned to frequencies consistent with decimation factors supported by the TDD block.
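
A short sketch of the arithmetic behind the bandwidth row of Table 4, assuming that each fixed stage divides the bandwidth at its input by its decimation factor (consistent with the fixed entries of Tables 3 and 4). For designs that include a TDF block, the value computed from the FDF stages alone is only the upper bound quoted in the table; the TDF then narrows it further. The function name and the dictionary layout are hypothetical.

```python
INPUT_BW_MHZ = 800.0  # input bandwidth common to all designs in Table 4


def cascade_bandwidth(factors, input_bw=INPUT_BW_MHZ):
    """Bandwidth after decimating by each fixed factor in turn."""
    bw = input_bw
    for d in factors:
        bw /= d
    return bw


# FDF decimation factors per design, copied from Table 4.
fdf_factors = {
    "Design 1": [],        # two TDF blocks, no FDF stage
    "Design 2": [8, 10],   # two FDF blocks, fully fixed
    "Design 3": [8],       # one FDF block plus one TDF block
    "Design 4": [10],      # one FDF block plus one TDF block
}

bounds = {name: cascade_bandwidth(f) for name, f in fdf_factors.items()}
for name, bw in bounds.items():
    print(name, bw)
```

The results (800, 10, 100, and 80 MHz) match the fixed bandwidth of Design 2 and the bounds listed for the tunable designs.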



